Meta-MapReduce: A Technique for Reducing Communication in MapReduce Computations

نویسندگان

  • Foto N. Afrati
  • Shlomi Dolev
  • Shantanu Sharma
  • Jeffrey D. Ullman
چکیده

MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is the next challenge where MapReduce should be modified to avoid (big) data migration across remote (cloud) sites. This is exactly our scope of research, where only the very essential data for obtaining the result is transmitted, reducing communication, processing and preserving data privacy as much as possible. In this work, we propose an algorithmic technique for MapReduce algorithms, called Meta-MapReduce, that decreases the communication cost by allowing us to process and move metadata to clouds and from the map phase to reduce phase. In Meta-MapReduce, the reduce phase fetches only the required data at required iterations, which in turn, assists in preserving the data privacy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dominance properties for Divisible MapReduce Computations

In this paper we analyze MapReduce distributed computations as divisible load scheduling problem. The two operations of mapping and reducing can be understood as two divisible applications with precedence constraints. A divisible load model is proposed, and schedule dominance properties are analyzed. We investigate dominant schedule structures for MapReduce computations. To our best knowledge t...

متن کامل

Private and Secure Secret Shared MapReduce

Data outsourcing allows data owners to keep their data in public clouds. However, public clouds do not ensure the privacy of data and computations. One fundamental and useful framework for processing data in a distributed fashion is MapReduce. In this paper, we investigate and present techniques for executing MapReduce computations in the public cloud while preserving privacy. Specifically, we ...

متن کامل

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

متن کامل

Security and Privacy Aspects in MapReduce on Clouds: A Survey

MapReduce is a programming system for distributed processing large-scale data in an efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern ma...

متن کامل

Google's MapReduce programming model - Revisited

Google’s MapReduce programming model serves for processing and generating large data sets in a massively parallel manner (subject to a suitable implementation of the model). We deliver the first rigorous description of the model. To this end, we reverse-engineer the seminal MapReduce paper and we capture our observations, assumptions and recommendations as an executable specification. We also i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1508.01171  شماره 

صفحات  -

تاریخ انتشار 2015